Causal inference

Murray Logan

10 July 2025

Background

“Correlation does not imply causation”



Karl Pearson (1900)

Structural Causal Modelling (SCM)

Step 1. Create a conceptual model of the system of interest

Step 2. Choose a statistical test

Step 3. Test the consistency of your conceptual model with the data

Step 4. Identify biases (confounding, overcontrol, collider)

Step 5. Test for causality with the appropriate statistical test

Step 6. Repeat steps 3-5 for each cause of interest

Step 7. Repeat steps 1-6 for other conceptual models

Step 1 - create the conceptual model

Conceptual model - DAGs

DAG - Directed Acyclic Graph


from: Arif and MacNeil (2022). “Utilizing Causal Diagrams Across Quasi‐experimental Approaches.” Ecosphere 13 (4)

Conceptual model - DAGs

Via R packages dagitty and ggdag

library(dagitty)
library(ggplot2)
library(ggdag)

dag <- dagitty('
dag {
  depth -> mpa
  depth -> fishing
  mpa -> fishing
  fishing -> biomass
  complexity -> mpa
  complexity -> biomass
  human -> fishing
  human -> biomass
  mpa -> coral
  biomass -> coral
  biomass[outcome, pos="3, 1"]
  fishing[pos = "2, 1"]
  mpa[exposure, pos = "1, 1"]
  depth[pos = "1, 2"]
  complexity[pos = "2, 2"]
  human[pos = "3, 2"]
  coral[pos = "2, 0"]
}
')
ggdag(dag) +
  geom_dag_point(color = "orange") +
  geom_dag_text(color = "black") +
  theme_dag_blank()
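Before using the DAG it can help to confirm that dagitty parsed it as intended. The sketch below re-declares the same edges (node positions are left out, since they only affect plotting) and queries the graph. The object name `dag_check` is an illustrative choice, not part of the original slides.

```r
library(dagitty)

# Same edges as the DAG above; positions dropped because they only matter for plotting
dag_check <- dagitty('
dag {
  depth -> mpa
  depth -> fishing
  mpa -> fishing
  fishing -> biomass
  complexity -> mpa
  complexity -> biomass
  human -> fishing
  human -> biomass
  mpa -> coral
  biomass -> coral
  mpa [exposure]
  biomass [outcome]
}
')

isAcyclic(dag_check)       # TRUE when the graph has no directed cycles
exposures(dag_check)       # variable(s) flagged as the exposure
outcomes(dag_check)        # variable(s) flagged as the outcome
parents(dag_check, "mpa")  # direct causes of mpa in the conceptual model
```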

Step 2 - choose statistical test

Step 3 - test consistency of conceptual model

Test conceptual model

Assumptions:

  • depth \(\perp\!\!\!\perp\) human

  • complexity \(\perp\!\!\!\perp\) depth

  • mpa \(\perp\!\!\!\perp\) human

  • fishing \(\perp\!\!\!\perp\) complexity \(|\) mpa, depth

  • coral \(\perp\!\!\!\perp\) fishing \(|\) biomass, mpa

  • many more

Test conceptual model

dagitty::impliedConditionalIndependencies(dag) 
bmss _||_ dpth | cmpl, fshn, humn
bmss _||_ mpa | cmpl, fshn, humn
cmpl _||_ corl | bmss, mpa
cmpl _||_ dpth
cmpl _||_ fshn | dpth, mpa
cmpl _||_ humn
corl _||_ dpth | cmpl, fshn, humn, mpa
corl _||_ dpth | bmss, mpa
corl _||_ fshn | bmss, mpa
corl _||_ humn | bmss, mpa
dpth _||_ humn
humn _||_ mpa

Test conceptual model

Simulated data

set.seed(123)
N <- 10000
depth <- rnorm(N, mean = 0, sd = 1)
human <- rnorm(N, mean = 0, sd = 1)
complexity <- rnorm(N, mean = 0, sd = 1)
mpa <- rbinom(N, 1, prob = plogis(0.2 * depth + 2.8 * complexity))
fishing <- rnorm(N, mean = -0.99 * mpa + -0.2 * depth + 0.3 * human, sd = 1)
biomass <- rnorm(N, -1.1 * fishing + -0.4 * human + 1.65 * complexity, sd = 1)
coral <- rnorm(N, mean = 0.5 * mpa + 2.5 * biomass, sd = 1)
dat <- data.frame(depth, human, complexity, mpa, fishing, coral, biomass)
head(dat)
        depth      human complexity mpa     fishing      coral    biomass
1 -0.56047565  2.3707252 -0.8362967   0 -0.06474014  -2.798193 -1.2848044
2 -0.23017749 -0.1668120 -0.2205730   1 -0.63374371   2.524107  0.1764197
3  1.55870831  0.9269614 -2.1035148   0  0.38843668 -10.319247 -4.1216673
4  0.07050839 -0.5681517 -1.6678075   0 -0.92777814  -3.268197 -1.7298705
5  0.12928774  0.2250901 -1.0979629   0  0.22189078  -3.114242 -1.5624274
6  1.71506499  1.1319859 -1.6656212   0  0.96233946  -7.754280 -3.2893050

Test conceptual model

tests <- localTests(x = dag, data = dat)
tests
                                            estimate    p.value         2.5%        97.5%
bmss _||_ dpth | cmpl, fshn, humn      -1.326104e-02 0.18491146 -0.032855508  0.006343624
bmss _||_ mpa | cmpl, fshn, humn        1.255555e-02 0.20938999 -0.007049193  0.032150653
cmpl _||_ corl | bmss, mpa             -2.162561e-03 0.82882998 -0.021763670  0.017440208
cmpl _||_ dpth                          2.000494e-02 0.04545055  0.000405027  0.039589492
cmpl _||_ fshn | dpth, mpa              4.665034e-05 0.99627879 -0.019555398  0.019648663
cmpl _||_ humn                         -3.067806e-03 0.75904469 -0.022666513  0.016533258
corl _||_ dpth | cmpl, fshn, humn, mpa -2.190013e-02 0.02855351 -0.041486319 -0.002297126
corl _||_ dpth | bmss, mpa             -2.557142e-02 0.01055610 -0.045150831 -0.005972383
corl _||_ fshn | bmss, mpa             -2.631970e-03 0.79244899 -0.022232854  0.016970936
corl _||_ humn | bmss, mpa             -7.609865e-03 0.44677002 -0.027207838  0.011993955
dpth _||_ humn                          6.021718e-03 0.54711505 -0.013579955  0.025618765
humn _||_ mpa                           8.364386e-03 0.40296780 -0.011237526  0.027959873
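To see what a single row of this table represents, one implied independence can be checked by hand. The sketch below assumes the default linear test, in which the estimate is a partial correlation. It re-creates the relevant part of the simulated data and computes the partial correlation between complexity and fishing given depth and mpa from regression residuals. `partial_cor` is a hypothetical helper written for this illustration, not a dagitty function.

```r
# Re-create the simulated variables needed here (same seed and draw order as the slides)
set.seed(123)
N <- 10000
depth <- rnorm(N, mean = 0, sd = 1)
human <- rnorm(N, mean = 0, sd = 1)
complexity <- rnorm(N, mean = 0, sd = 1)
mpa <- rbinom(N, 1, prob = plogis(0.2 * depth + 2.8 * complexity))
fishing <- rnorm(N, mean = -0.99 * mpa + -0.2 * depth + 0.3 * human, sd = 1)

# Hypothetical helper: partial correlation of x and y given the columns of z,
# computed as the correlation between the two sets of regression residuals
partial_cor <- function(x, y, z) {
  rx <- resid(lm(x ~ ., data = z))
  ry <- resid(lm(y ~ ., data = z))
  cor(rx, ry)
}

# Implied by the DAG: complexity _||_ fishing | depth, mpa (expect a value near zero)
pc_conditional <- partial_cor(complexity, fishing, data.frame(depth, mpa))

# Not implied by the DAG: the unconditional association, which runs through mpa
pc_marginal <- cor(complexity, fishing)

c(conditional = pc_conditional, marginal = pc_marginal)
```

If the conceptual model is consistent with the data, the conditional value should sit close to zero while the marginal correlation should not.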

Test conceptual model

plotLocalTestResults(tests)

Step 4 - identify and adjust for biases

All paths

Three open paths between mpa and fish biomass

Path 1. frontdoor path (via a mediator, fishing)

Path 2. backdoor path (via a confounder, complexity)

Path 3. backdoor path (via a confounder, depth)

No direct paths

Multiple closed paths
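The open and closed paths can also be listed as text rather than read off a figure. The sketch below uses `dagitty::paths()`, which returns every path between two nodes together with a flag for whether it is open. The DAG is re-declared without positions, and the object names (`dag_paths`, `p`, `p_adj`) are illustrative.

```r
library(dagitty)

# Same edges as the conceptual model; positions omitted
dag_paths <- dagitty('
dag {
  depth -> mpa
  depth -> fishing
  mpa -> fishing
  fishing -> biomass
  complexity -> mpa
  complexity -> biomass
  human -> fishing
  human -> biomass
  mpa -> coral
  biomass -> coral
}
')

# All paths between mpa and biomass, with no conditioning
p <- paths(dag_paths, from = "mpa", to = "biomass")
data.frame(path = p$paths, open = p$open)

# The same paths after conditioning on complexity and depth
p_adj <- paths(dag_paths, from = "mpa", to = "biomass", Z = c("complexity", "depth"))
data.frame(path = p_adj$paths, open = p_adj$open)
```

Comparing the two tables shows which paths a given set of covariates closes, and whether conditioning opens any path that was previously closed.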

Biases

Criteria for avoiding biases:

  • Backdoor criterion: rules to block all backdoor paths, i.e. paths that start with an arrow pointing into the exposure variable (MPA)

  • Frontdoor criterion: rules that let us get from the exposure to the outcome via a mediator in a way that avoids backdoors.

    • Use two paths (mpa -> fishing and then fishing -> biomass).
    • Multiply the two coefficients together.
    • Useful when backdoor paths cannot be blocked.
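The "multiply the coefficients" idea can be sketched on data simulated from the conceptual model. This is an illustration rather than a textbook frontdoor estimator: it uses `lm()` and picks covariates for each step from the DAG (depth for mpa -> fishing; human and complexity for fishing -> biomass), which is an assumption of this sketch. In the simulation the true values are -0.99 and -1.1, so the product should be near 1.089.

```r
# Simulate from the conceptual model (same seed and draw order as the slides)
set.seed(123)
N <- 10000
depth <- rnorm(N, mean = 0, sd = 1)
human <- rnorm(N, mean = 0, sd = 1)
complexity <- rnorm(N, mean = 0, sd = 1)
mpa <- rbinom(N, 1, prob = plogis(0.2 * depth + 2.8 * complexity))
fishing <- rnorm(N, mean = -0.99 * mpa + -0.2 * depth + 0.3 * human, sd = 1)
biomass <- rnorm(N, -1.1 * fishing + -0.4 * human + 1.65 * complexity, sd = 1)

# Step 1: effect of mpa on fishing (depth is a common cause of both)
a <- coef(lm(fishing ~ mpa + depth))["mpa"]

# Step 2: effect of fishing on biomass (human and complexity block the backdoors)
b <- coef(lm(biomass ~ fishing + human + complexity))["fishing"]

# Effect of mpa on biomass through fishing: product of the two coefficients
frontdoor_effect <- unname(a * b)
frontdoor_effect
```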

Relation types

  • X indirectly affects Y through a mediator, M (a pipe)
  • Conditioning on M blocks the path, resulting in overcontrol bias
  • Backdoor criterion
    • not to condition on M
    • unless we want to estimate the direct effects of X on Y
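A small simulation can make overcontrol bias concrete. In this sketch the variable names and effect sizes (0.8 and 0.5) are invented for illustration; the total effect of x on y is therefore 0.8 * 0.5 = 0.4, all of it passing through m.

```r
set.seed(1)
n <- 5000
x <- rnorm(n)
m <- 0.8 * x + rnorm(n)   # x -> m
y <- 0.5 * m + rnorm(n)   # m -> y (no direct x -> y arrow)

# Total effect: leave the mediator out
total_effect <- coef(lm(y ~ x))["x"]

# Conditioning on the mediator blocks the pipe, leaving only the (absent) direct effect
direct_effect <- coef(lm(y ~ x + m))["x"]

c(total = unname(total_effect), direct = unname(direct_effect))
```

The first estimate should be close to 0.4 and the second close to zero, which is the overcontrol bias if the total effect was the target.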

Relation types

  • C (confounder) is a common cause of both X and Y (a fork)
  • Not conditioning on C results in confounding bias
  • Conditioning on C blocks the path
  • Backdoor criterion
    • to condition on C
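Confounding bias can be shown the same way. In this sketch (variable names and the 0.8 coefficients are invented for illustration) x has no effect on y at all; any association between them comes from the common cause.

```r
set.seed(2)
n <- 5000
conf <- rnorm(n)            # the confounder
x <- 0.8 * conf + rnorm(n)  # conf -> x
y <- 0.8 * conf + rnorm(n)  # conf -> y (x does not affect y)

# Leaving the confounder out: a spurious x effect appears
naive <- coef(lm(y ~ x))["x"]

# Conditioning on the confounder closes the backdoor
adjusted <- coef(lm(y ~ x + conf))["x"]

c(naive = unname(naive), adjusted = unname(adjusted))
```

The naive coefficient should be clearly positive even though the true effect is zero, while the adjusted coefficient should be near zero.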

Relation types

  • C (collider) is a common effect of both X and Y
  • Conditioning on C results in collider bias
  • X and Y are independent unless we condition on C
  • Backdoor criterion
    • not to condition on C
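Collider bias is the mirror image of confounding, and a sketch with invented variables shows it. Here x and y are generated independently and both feed into a common effect; conditioning on that common effect manufactures an association.

```r
set.seed(3)
n <- 5000
x <- rnorm(n)
y <- rnorm(n)               # independent of x by construction
coll <- x + y + rnorm(n)    # x -> coll <- y

# Without the collider: no association, as it should be
marginal <- coef(lm(y ~ x))["x"]

# Conditioning on the collider opens the path and induces a negative association
conditioned <- coef(lm(y ~ x + coll))["x"]

c(marginal = unname(marginal), conditioned = unname(conditioned))
```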

Relation types

  • Conditioning on a descendant (a variable caused by a mediator, confounder or collider) acts as a weaker form of conditioning on that variable itself

Biases

  • Overcontrol bias: caused when we condition on a mediator, thereby blocking the path from the exposure to the outcome.

  • Confounding bias: caused when a variable that is a common cause of both the exposure and the outcome (i.e. it lies on a backdoor path) is not included in a model.

  • Collider bias: caused when a variable that is a common effect of the exposure and the outcome is included in a model, which opens a path that would otherwise be closed.

Relation types and biases

  • what paths are between MPA and fish biomass?
  • how do we close any backdoors?

Relation types and biases



My brain hurts, this all sounds difficult…

Relation types and biases

R packages dagitty and ggdag

  • Identify the paths
ggdag_paths(dag, from = "mpa", to = "biomass", text_col = "black") +
    coord_cartesian(expand = FALSE, xlim=c(0.5, 3.5), ylim=c(0.5,2.5)) +
    theme_dag_blank(panel.border = element_rect(fill = NA)) 

Relation types and biases

  • Path 1 is a pipe
    • do not condition on fishing
  • Path 2 is a fork
    • do condition on complexity
  • Path 3 is a fork
    • do condition on depth

Relation types and biases

Adjustment sets - covariates required to close backdoors

dagitty::adjustmentSets(dag,
                        exposure = "mpa",
                        outcome = "biomass",
                        effect = "total"
                        )
{ complexity, depth }
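A candidate covariate set can also be checked directly, which is handy when some covariates were not measured. The sketch below uses `dagitty::isAdjustmentSet()` on a re-declared copy of the DAG; the object name `dag_adj` is illustrative.

```r
library(dagitty)

# Same edges as the conceptual model; positions omitted
dag_adj <- dagitty('
dag {
  depth -> mpa
  depth -> fishing
  mpa -> fishing
  fishing -> biomass
  complexity -> mpa
  complexity -> biomass
  human -> fishing
  human -> biomass
  mpa -> coral
  biomass -> coral
}
')

# The set reported by adjustmentSets() closes every backdoor
full_set_ok <- isAdjustmentSet(dag_adj, c("complexity", "depth"),
                               exposure = "mpa", outcome = "biomass")

# Complexity alone leaves the backdoor through depth open
partial_set_ok <- isAdjustmentSet(dag_adj, "complexity",
                                  exposure = "mpa", outcome = "biomass")

c(complexity_and_depth = full_set_ok, complexity_only = partial_set_ok)
```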

Step 5 - fit the model

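library(brms)
# Adjust for the minimal adjustment set { complexity, depth }
form <- bf(biomass ~ factor(mpa) + depth + complexity)
# OR better still, with standardised covariates
form <- bf(biomass ~ factor(mpa) + scale(depth) + scale(complexity))
# OR allowing the MPA effect to vary with the covariates
form <- bf(biomass ~ factor(mpa) * scale(depth) * scale(complexity))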
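Because the data were simulated, the adjusted model can be checked against the known answer. The sketch below swaps in `lm()` for the Bayesian model purely for speed (an assumption of this illustration, not the approach used in the slides). In the simulation the only route from mpa to biomass is through fishing, so the true total effect is (-0.99) * (-1.1) = 1.089.

```r
# Simulate from the conceptual model (same seed and draw order as the slides)
set.seed(123)
N <- 10000
depth <- rnorm(N, mean = 0, sd = 1)
human <- rnorm(N, mean = 0, sd = 1)
complexity <- rnorm(N, mean = 0, sd = 1)
mpa <- rbinom(N, 1, prob = plogis(0.2 * depth + 2.8 * complexity))
fishing <- rnorm(N, mean = -0.99 * mpa + -0.2 * depth + 0.3 * human, sd = 1)
biomass <- rnorm(N, -1.1 * fishing + -0.4 * human + 1.65 * complexity, sd = 1)
dat <- data.frame(depth, human, complexity, mpa, fishing, biomass)

# No adjustment: both backdoors (complexity and depth) are left open
naive_mpa <- coef(lm(biomass ~ mpa, data = dat))["mpa"]

# Adjusting for { complexity, depth } closes the backdoors
adjusted_mpa <- coef(lm(biomass ~ mpa + depth + complexity, data = dat))["mpa"]

c(naive = unname(naive_mpa), adjusted = unname(adjusted_mpa), truth = 1.089)
```

The adjusted estimate should land near the true value, while the naive estimate should be noticeably inflated by confounding.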

Relation types and biases

Adjustment sets - covariates required to close backdoors

set.seed(123)
ggdag_adjustment_set(dag,
                     exposure = "mpa",
                     outcome = "biomass",
                     effect = "total",
                     shadow = TRUE, text_col = "black"
                     ) +
  theme_dag_blank()

Relation types and biases

Total effect of Structural complexity on fish biomass

dagitty::adjustmentSets(dag,
                        exposure = "complexity",
                        outcome = "biomass",
                        effect = "total"
                        )
 {}

Step 5 - fit the model

form <- bf(biomass ~ complexity)

Relation types and biases

Direct effect of Structural complexity on fish biomass

dagitty::adjustmentSets(dag,
                        exposure = "complexity",
                        outcome = "biomass",
                        effect = "direct"
                        )
{ fishing, human }
{ depth, mpa }

Step 5 - fit the model

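# Using the first adjustment set { fishing, human }
form <- bf(biomass ~ scale(complexity) + scale(fishing) + scale(human))
# OR, using the second adjustment set { depth, mpa }
form <- bf(biomass ~ scale(complexity) + scale(depth) + factor(mpa))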
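Either adjustment set should give the same direct effect, and with simulated data this can be checked. The sketch below uses `lm()` on unscaled covariates as a quick stand-in for the Bayesian models (an assumption of this illustration), so the coefficients are comparable to the value used in the simulation, where the direct effect of complexity on biomass is 1.65.

```r
# Simulate from the conceptual model (same seed and draw order as the slides)
set.seed(123)
N <- 10000
depth <- rnorm(N, mean = 0, sd = 1)
human <- rnorm(N, mean = 0, sd = 1)
complexity <- rnorm(N, mean = 0, sd = 1)
mpa <- rbinom(N, 1, prob = plogis(0.2 * depth + 2.8 * complexity))
fishing <- rnorm(N, mean = -0.99 * mpa + -0.2 * depth + 0.3 * human, sd = 1)
biomass <- rnorm(N, -1.1 * fishing + -0.4 * human + 1.65 * complexity, sd = 1)
dat <- data.frame(depth, human, complexity, mpa, fishing, biomass)

# Adjustment set { fishing, human }
direct_1 <- coef(lm(biomass ~ complexity + fishing + human, data = dat))["complexity"]

# Adjustment set { depth, mpa }
direct_2 <- coef(lm(biomass ~ complexity + depth + mpa, data = dat))["complexity"]

c(set_1 = unname(direct_1), set_2 = unname(direct_2), truth = 1.65)
```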